fix: add Microsoft OSS compliance boilerplate #4
Merged
Adds CODE_OF_CONDUCT.md, SECURITY.md, SUPPORT.md, LICENSE and Contributing + Trademarks sections to README.md, copied verbatim from microsoft/amplifier-core to bring this repo in line with the public Amplifier ecosystem standard. Generated by Amplifier ecosystem-audit recipe.
colombod added a commit that referenced this pull request on Jun 18, 2026
…e-visibility signal (v4.0.1) (#15)

* docs: dangling-node reader audit sign-off (#278 Phase 1 gate)

Enumerate every node→edge reader in neo4j_store.py, services.py, and routers/ via the three spec-mandated grep commands. Classify each hit as TOLERANT or NEEDS-FIX. Confirm get_node (neo4j_store.py:566) and get_edge (neo4j_store.py:601) are SAFE independent point-lookups.

All 7 grep hits are TOLERANT:
- neo4j_store.py:616 get_edge() Cypher fallback: property-filtered edge lookup, not a node→edge walk; no node-existence dependency.
- services.py:70,125-127,135,143: GraphState in-memory dict operations; write paths (70,125-127,143) and direct key lookup (135); none walk node→edge.
- routers/: zero hits.

NEEDS-FIX count: 0. No code changes required. Phase-1 gate PASSED. All other Phase-1 tasks may proceed.

Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* feat: add neo4j_flush_chunk_rows/bytes config knobs (#278)

Add two config knobs for sub-transaction chunking in _flush_body:
- neo4j_flush_chunk_rows: int = 100 (cardinality bound)
- neo4j_flush_chunk_bytes: int = 4_194_304 (4 MiB payload bound)

A chunk closes when EITHER bound trips first. Tests verify defaults via test_neo4j_flush_chunk_rows_default and test_neo4j_flush_chunk_bytes_default.

* feat: thread flush_chunk_rows/bytes into Neo4jGraphStore.__init__ with clamp (#278)

- Add flush_chunk_rows (default 100) and flush_chunk_bytes (default 4_194_304) to __init__ signature
- Store as _flush_chunk_rows / _flush_chunk_bytes with max(1, value) clamp to prevent zero/negative chunks
- Add _make_store_chunked helper and three new tests covering: nominal values, clamping of non-positive inputs, and default values

* feat: add _serialized_row_size byte estimator (#278)

Measures the JSON-serialized form of a row value (len(json.dumps(v, default=str))) rather than len() on the dict/list, which would return the element/key count and be blind to fat nested payloads such as large messages arrays or context_snapshot dicts. default=str ensures datetimes and other non-JSON-serialisable values never raise, falling back to str() length in the unlikely event json.dumps itself fails.

Tests:
- test_serialized_row_size_uses_serialized_form_not_len: fat dict with ~4000-char nested strings yields > 3000 (not 3 as len() would give)
- test_serialized_row_size_handles_unjsonable_value: datetime value returns > 0, proving no crash on non-JSON types

* feat: add _chunk_dict/_chunk_list dual-bound chunk helpers (#278)

- Add _chunk_dict(snapshot, max_rows, max_bytes) generator that yields dict chunks bounded by both row count and byte size.
- Add _chunk_list(snapshot, max_rows, max_bytes) generator for list payloads with the same dual-bound logic.
- Both helpers implement the one-row floor: a single oversized row is always yielded alone, never split, never looped.
- _serialized_row_size() used for byte estimation in both helpers.
- 5 new tests cover: row bound, byte bound, one-row floor, empty input, and list variant. No row lost or duplicated.

* test: add capped Neo4j fixture + calibration guard for OOM proof (#278)

Add module-scoped neo4j_container_capped fixture to conftest.py that runs neo4j:5.26.22-community with NEO4J_db_memory_transaction_max=2m, mirroring the session-scoped neo4j_container bootstrap logic (random ports, 5-attempt port-flake retry on APIError 'ports are not available', httpx readiness poll up to 60s, remove=True, container.stop() teardown). Cap is set via env at startup; runtime dbms.setConfigValue does not exist on Community Edition.

Create tests/neo4j/test_oom_regression.py with:
- _OOM_CODE module constant
- _low_retry_store() helper: constructs Neo4jGraphStore, closes original 30s-retry driver (no leak), swaps in AsyncGraphDatabase.driver with max_transaction_retry_time=2.0
- _buffer_fat_nodes() helper: buffers n single-phase node rows with ~blob_bytes blob property and UNIQUE prefix-scoped node_ids
- _purge_prefix() helper: DETACH DELETE for nodes under a prefix (order-independent)
- test_calibration_guard_tiny_write_succeeds: buffers one tiny node, flushes (must not raise), asserts MATCH count == 1

* test: store-level OOM RED on capped container (enormous=OOM, small=drain-red) (#278)

Add two store-level OOM regression tests to test_oom_regression.py:

test_unbounded_single_phase_flush_ooms:
- Enormous flush_chunk_rows/flush_chunk_bytes (10M rows, 10GB bytes)
- 400 fat nodes × 20 KB = ~8 MB single-phase payload, 4× over the 2 MiB cap
- Asserts TransientError with code == Neo.TransientError.General.MemoryPoolOutOfMemoryError
- Asserts MATCH count == 0 (nothing commits on OOM, buffer restored)

test_chunked_flush_drains_same_single_phase_buffer (RED):
- Small flush_chunk_rows=50, flush_chunk_bytes=262_144 (256 KB per chunk)
- Same 400 fat nodes: each chunk is ~50 × 20 KB ≈ 1 MB, 4× UNDER the cap
- Currently FAILS with TransientError/MemoryPoolOutOfMemoryError because _flush_body does not use flush_chunk_rows/flush_chunk_bytes yet
- GREEN state (after Task 8 fix): flush() must not raise, buffer empty, count == 400

Also adds TransientError import from neo4j.exceptions.

Test run (pre-fix):
  PASSED calibration_guard_tiny_write_succeeds
  PASSED test_unbounded_single_phase_flush_ooms (OOM confirmed, count == 0)
  FAILED test_chunked_flush_drains_same_single_phase_buffer (genuine RED)

* feat: chunked phased _flush_body coordinator - fixes OOM stall (#278)

Replace the single-transaction _flush_body with a phased, dual-bounded, per-chunk-committed coordinator that eliminates the MemoryPoolOutOfMemoryError caused by sending all buffered nodes/edges/patches in one transaction.

Changes:
- _flush_body now iterates each buffer through _chunk_dict/_chunk_list with self._flush_chunk_rows / self._flush_chunk_bytes bounds
- Each chunk is committed in its own independent execute_write (separate Neo4j session); no multi-chunk explicit transactions that would re-collapse the memory bound
- Phase order: nodes → label patches → edges (preserves referential integrity)
- On any chunk failure: logs flush_chunk_failed + re-raises; finally block merges snapshot back into live buffers (full retry on next flush)
- _write_batch is byte-for-byte unchanged

Test results:
- tests/neo4j/test_oom_regression.py: 3/3 passed (calibration guard, enormous-bounds OOM cause asserted, small-bounds drains exactly 400 nodes)
- tests/test_neo4j_store.py: 101/101 passed (no regressions)

* test: coordinator within-bounds wiring guard + empty-buffer guard (#278)

Add two characterization/guard tests for the phased chunked-flush coordinator (Task 8's _flush_body):
- test_coordinator_every_execute_write_within_bounds: seeds 35 nodes at rows=10, captures every execute_write payload, and asserts each node chunk satisfies len(nodes)<=10 AND (total_bytes<=10_000_000 OR len==1).
- test_coordinator_empty_buffer_makes_zero_calls: verifies that flush() with empty buffers short-circuits before opening a session, so execute_write is never called.

Both tests pass against the existing coordinator implementation. Chunk-size arithmetic is owned by Task 5 tests and is not re-tested here.
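
The dual-bound chunking described in these commits can be illustrated with a minimal sketch. The names `serialized_row_size` and `chunk_list` are hypothetical stand-ins for `_serialized_row_size` and `_chunk_list`, whose actual bodies are not shown in this commit message. The sketch assumes a chunk closes just before the row that would push it past either bound, and that a single oversized row is emitted on its own.

```python
import json
from typing import Any, Iterator


def serialized_row_size(value: Any) -> int:
    """Estimate a row's payload size from its JSON-serialized form."""
    try:
        # default=str keeps datetimes and similar values from raising.
        return len(json.dumps(value, default=str))
    except (TypeError, ValueError):
        # Assumed fallback: use the str() length if serialization fails.
        return len(str(value))


def chunk_list(rows: list[Any], max_rows: int, max_bytes: int) -> Iterator[list[Any]]:
    """Yield consecutive chunks bounded by row count and estimated bytes."""
    chunk: list[Any] = []
    chunk_bytes = 0
    for row in rows:
        size = serialized_row_size(row)
        # Close the current chunk if this row would trip either bound.
        if chunk and (len(chunk) >= max_rows or chunk_bytes + size > max_bytes):
            yield chunk
            chunk, chunk_bytes = [], 0
        # One-row floor: an oversized row is still appended, and the next
        # iteration closes the chunk so that row goes out alone.
        chunk.append(row)
        chunk_bytes += size
    if chunk:
        yield chunk
```

With `max_rows=3` and a generous byte bound, seven rows come out as chunks of 3, 3, and 1, and concatenating the chunks gives back the input in order.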

* test: coordinator phase ordering nodes->patches->edges (#278 A.4)

* test: re-raise invariant - 3 cases, full restore, ERROR log (#278 A.5)

Add parametrized test test_reraise_restores_full_snapshot_and_logs covering 3 materially different durable-progress states:
- first_chunk_fails (index 0): nothing committed to Neo4j
- later_chunk_same_phase_committed (index 1): first node chunk committed, second node chunk fails; partial within node phase
- edge_after_nodes_committed (index 3): all 3 node chunks committed, first edge chunk fails; partial durable progress across phases

In all 3 cases asserts:
1. RuntimeError('chunk boom') propagates out of flush()
2. _node_buffer and _edge_buffer fully restored to original snapshot
3. ERROR log containing 'flush_chunk_failed' is emitted

Also adds helpers:
- _seq_execute_write_failing_on(call_index): execute_write mock that succeeds until call_index then raises RuntimeError('chunk boom')
- _wire_session(store, execute_write_mock): wires fake session boundary (MagicMock cm / __aenter__ / __aexit__) onto store

Hard constraint #4 guard: coordinator must re-raise on any chunk failure and never return success after a partial flush.

* feat: thread flush-chunk bounds from settings into get_or_create (#278)

Pass flush_chunk_rows and flush_chunk_bytes from Settings into the Neo4jGraphStore constructor in get_or_create. Settings is already fetched at the top of get_or_create (line 434); the new fields reuse the same settings binding without a second get_settings() call.

Also extend the _SettingsProxy in tests/conftest.py to expose the two new fields so the autouse safe_settings fixture does not cause AttributeError when the registry path exercises them in tests.

Test: test_get_or_create_threads_flush_chunk_bounds monkeypatches Neo4jGraphStore and start_drain, constructs SessionRegistry, calls get_or_create, and asserts flush_chunk_rows==100 and flush_chunk_bytes==4_194_304 (the Settings defaults).

* ops: set finite db.memory.transaction.max deployment cap (#278)

* test: live finalization-path freeze->restart->drain arc, OOM cause asserted (#278)

Three-leg integration test covering the _finalize_session failure path that manifests as frozen offsets across restarts (issue #278):

Leg 1 OLD-FREEZE: rows=10_000_000 / byts=10_000_000_000 forces all ~201 fat nodes (~8 MB total) into a single transaction, hitting the 2 MiB per-transaction cap. Asserts:
- 'finalize_tail_flush_failed' logged
- _OOM_CODE ('Neo.TransientError.General.MemoryPoolOutOfMemoryError') positively present in caplog (not a proxy assertion)
- Worker stays registered (finalize returned early without cleanup)
- Committed offset frozen at 0
- 0 f-* ToolCall nodes committed

Leg 2 RESTART: fresh SessionRegistry + SessionWorker over the same on-disk queue with the same old bounds. Confirms the freeze survives a process restart (offset still 0, 0 committed).

Leg 3 RESTART FIXED: rows=50 / byts=262_144 keeps each chunk ~250 KB well under the 2 MiB cap. Asserts:
- Offset advances to tail_end (full queue drained)
- Worker deregistered on successful finalization
- 100 f-* ToolCall nodes committed to Neo4j

Queue seeding: 100 tool:pre events each carrying tool_call_id='f-{i}' and tool_input='x'*40_000 (fat ToolCall + Event nodes), plus one session:end event.

Also adds _line() helper and top-level imports (json, logging, Path) to the module.

* test: live cross-chunk referential integrity + large-buffer happy path (#278)

* feat: add SessionWorker.last_successful_flush liveness field (#278)

Add last_successful_flush: float = field(default_factory=time.time) to the SessionWorker @dataclass in registry.py. The field defaults to the worker's creation time (NOT 0.0) so a brand-new worker reads as fresh, not ancient. A 0.0 default would make every new worker appear to have last flushed in 1970. Defaulting to creation time means 'no flush has happened yet, but the worker is fresh.' The field will be stamped in _flush_barrier in a subsequent task.

TDD: test TestLastSuccessfulFlushField::test_defaults_to_creation_time_not_zero first failed with AttributeError ('SessionWorker' object has no attribute 'last_successful_flush'), then passed after the field was added.

* feat: stamp last_successful_flush once at the _flush_barrier boundary (#278)

Phase 2 (#278): after awaited flush succeeds inside _flush_barrier, stamp worker.last_successful_flush = time.time(). All three drainer success paths (drain, exhausted-per-line, finalize) funnel through _flush_barrier, so this single stamp covers all of them. A separate stamp at each call site would be redundant and drift-prone.

Test: TestFlushBarrierStampsLiveness.test_flush_barrier_advances_last_successful_flush
- Forces worker.last_successful_flush = 0.0 before call
- Asserts value >= before after _flush_barrier returns
- Confirmed FAIL before fix (stays 0.0), PASS after

* feat: add SessionRegistry.orphaned_sessions() predicate (#278)

A worker is orphaned iff it is still registered in _workers AND its asyncio task has completed (task.done()). This catches the finalization-path orphan (tail flush returns early without deregistering, so the task completes but the worker remains) and any unhandled exception that escapes the drain loop. Deterministic and instant: no timer, no threshold.

Three tests added (TestOrphanedSessions):
- test_completed_task_worker_is_orphaned: done task → reported
- test_live_task_worker_is_not_orphaned: running task → not reported
- test_no_task_worker_is_not_orphaned: task=None → not reported

* feat: surface orphaned + last_successful_flush on status response (#278)

- Compute orphaned_ids set once from registry.orphaned_sessions() (single source of truth; no inline task.done() calls scattered through dict comp)
- Add orphaned (bool) and last_successful_flush (float) to each per-session dict in build_status_response
- Add top-level orphaned_sessions count (aggregate, safe for unauthenticated /status endpoint; no per-session error strings leaked)
- Tests: TestBuildStatusResponseOrphanVisibility, 3 tests covering done-task → orphaned=True + count, running-task → orphaned=False + count=0, and last_successful_flush presence/value

* test: live E2E - real finalization orphan surfaces on /status (#278)

Adds tests/neo4j/test_orphan_visibility.py: a module-scoped live E2E test (pytestmark = pytest.mark.neo4j) proving a genuine finalization-path orphan surfaces on /status. The test drives the real start_drain → drain_worker → _finalize_session path; worker.task is an actual asyncio.Task that transitions to done().

Reproduction recipe (deterministic, ~29s against neo4j:5.26.22-community):

Seed shape (line counts are load-bearing):
  Lines 1-99: tiny tool:pre (tool_input 'x'*16, key space small-{i})
  Line 100: session:end (terminal; exactly fills read_batch max_items=100)
  Lines 101-200: fat tool:pre (tool_input 'x'*40_000, key space f-{i}, ~8 MB)

WHY: The pre-terminal block is exactly 100 lines so the drainer's first read_batch returns only those, commits cleanly (tiny flush << 2 MiB cap), sets saw_terminal → _finalize_session. The finalization tail (100 fat lines) is flushed in ONE transaction (rows=10_000_000, byts=10_000_000_000) → OOM. _finalize_session does NOT retry; one OOM → finalize_tail_flush_failed log + early return → orphan (registered worker, completed task).

Orphan post-state assertions (all required, none weakened):
- worker.task.done() True
- sid still in registry._workers (not deregistered)
- 'finalize_tail_flush_failed' in caplog.text
- _OOM_CODE ('Neo.TransientError.General.MemoryPoolOutOfMemoryError') in caplog.text
- committed offset frozen at pre-terminal boundary (== boundary, != tail_end)
- 0 f-* nodes committed (queried via a FRESH check_store)
- worker in registry.orphaned_sessions()
- build_status_response reports orphaned_sessions >= 1 and the session's per-session dict has orphaned: True

Teardown closes the still-open store driver (early-returned _finalize_session did not call _safe_close). No production code changes.

* chore: bump version 4.0.0 -> 4.0.1 (#278 Phase 1 + Phase 2 release marker)

- Add TestVersionIs401.test_pyproject_version_is_401 that reads pyproject.toml via tomllib and asserts version == '4.0.1' (single source of truth gate)
- Bump pyproject.toml line 7: version = "4.0.0" -> version = "4.0.1"

Test cycle confirmed:
  RED: assert '4.0.0' == '4.0.1' (AssertionError before bump)
  GREEN: 1 passed (after bump)

* docs/refactor: apply pre-merge review fixes (#278)

Three issues from the holistic code review, all cosmetic/documentary:

1. dashboard.py build_status_response docstring: add orphaned_sessions to the Returns key list; document the orphaned_sessions count vs visible-session asymmetry (aged-out orphan contributes to count but won't appear with orphaned:True in sessions list); add orphaned and last_successful_flush to the sessions-dict key list.
2. test_orphan_visibility.py: rename test function to align with the spec's acceptance-criteria reference: test_real_drain_orphan_surfaces_on_status -> test_finalization_orphan_surfaces_on_status. Resolves the Task 7 DONE_WITH_CONCERNS naming discrepancy.
3. test_version.py: rename TestVersionIs401 -> TestVersionIs4_0_1 and test_pyproject_version_is_401 -> test_pyproject_version_is_4_0_1 to eliminate the HTTP-401-status-code ambiguity in the class name.

Note: review recommendation #2 (strengthen flush-value assertion) was already implemented; tests/test_dashboard.py lines 492-493 already carry both the key-presence check and the value-equality check.

Non-Neo4j suite: 1343 passed, 2 skipped (no regressions).
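
As a rough illustration of the liveness field and the orphan predicate from the commits above, the sketch below models a worker and a registry. The class bodies, the `session_id` field, and the list return type are assumptions for illustration; only `last_successful_flush`, `_workers`, `task.done()`, and `orphaned_sessions()` are named in the commit messages.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionWorker:
    session_id: str  # hypothetical field, for illustration only
    task: Optional[asyncio.Task] = None
    # Defaults to creation time, so a new worker reads as fresh
    # rather than as having last flushed in 1970.
    last_successful_flush: float = field(default_factory=time.time)


class SessionRegistry:
    def __init__(self) -> None:
        self._workers: dict[str, SessionWorker] = {}

    def orphaned_sessions(self) -> list[str]:
        """Ids of workers still registered whose task has completed."""
        return [
            sid
            for sid, worker in self._workers.items()
            if worker.task is not None and worker.task.done()
        ]
```

A worker with `task=None` or a still-running task is not reported, which matches the three cases listed for TestOrphanedSessions.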

* chore: drop superpowers reader-audit doc from product docs/ (#278)

The reader audit conclusion (zero dangling-node readers in the codebase) is captured in the PR description. Superpowers-generated docs belong outside the product documentation tree per repo conventions.

---------

Co-authored-by: Amplifier <amplifier@example.com>
Co-authored-by: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
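
The failure handling described for the chunked coordinator earlier in this commit message (log `flush_chunk_failed`, re-raise, restore the full snapshot) might look roughly like the following. This is a simplified, synchronous sketch: `BufferedStore` and `write_chunk` are hypothetical, only the row bound is applied, and restoring already-committed chunks presumes the writes are idempotent, which the commit message does not state.

```python
import logging
from typing import Callable

logger = logging.getLogger("flush_sketch")


class BufferedStore:
    """Hypothetical stand-in for a buffered graph store."""

    def __init__(self, write_chunk: Callable[[list[dict]], None], chunk_rows: int = 100) -> None:
        self._write_chunk = write_chunk
        self._chunk_rows = max(1, chunk_rows)  # clamp non-positive bounds
        self._node_buffer: list[dict] = []

    def buffer(self, row: dict) -> None:
        self._node_buffer.append(row)

    def flush(self) -> None:
        # Snapshot and clear, so rows buffered during the flush stay separate.
        snapshot, self._node_buffer = self._node_buffer, []
        succeeded = False
        try:
            for start in range(0, len(snapshot), self._chunk_rows):
                chunk = snapshot[start:start + self._chunk_rows]
                try:
                    self._write_chunk(chunk)  # one independent commit per chunk
                except Exception:
                    logger.error("flush_chunk_failed", exc_info=True)
                    raise  # never report success after a partial flush
            succeeded = True
        finally:
            if not succeeded:
                # Restore the whole snapshot, committed chunks included,
                # so the next flush retries everything.
                self._node_buffer = snapshot + self._node_buffer
```

If the second of three chunks fails, the error propagates, the buffer again holds every original row, and a later `flush()` call retries from the first chunk.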
Adds the standard Microsoft OSS compliance scaffolding to bring this repo in line with the public Amplifier ecosystem standard.
Files added
- CODE_OF_CONDUCT.md: verbatim from microsoft/amplifier-core
- SECURITY.md: verbatim from microsoft/amplifier-core
- SUPPORT.md: verbatim from microsoft/amplifier-core
- LICENSE: MIT, verbatim from microsoft/amplifier-core

README.md updates
- `## Contributing` section (verbatim from microsoft/amplifier-core)
- `## Trademarks` section (verbatim from microsoft/amplifier-core)

This is part of a coordinated cleanup across all 14 private amplifier-* repos identified by the ecosystem audit on 2026-05-03. The same change is being applied uniformly to each; there is no per-repo customization.

🤖 Generated with Amplifier